Environment Setup Guide

By Hongyu Xiao

Contact: hongyu.xiao@ou.edu

Environment Setup Guide for Research Computing

This guide provides instructions for setting up your research computing environment. Whether you're a new researcher, student, or staff member, proper environment configuration is crucial for efficient computational work.

Purpose and Objectives

The main objectives of this environment setup are:

- Monitoring and managing computational jobs with the SLURM scheduler
- Creating isolated Python environments with Miniconda
- Configuring CUDA and installing SeisBench for machine learning work

By following this guide, you'll have a fully functional research computing environment that meets your immediate needs and supports future scalability.

SLURM Job Monitoring

Jobs on OSCER are scheduled and managed with SLURM. Here are examples of using squeue and grep to monitor jobs:

# View all jobs in the queue
$ squeue
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
123456 disc_dual python_tr hongyux  R    2:30:15      1 node001
123457 disc_dual tensor_jo  user2   R   12:45:22      1 node002
123458 disc_dual pytorch_t  user3  PD    0:00:00      1 (Resources)

# Filter jobs on disc partitions
$ squeue | grep disc
123456 disc_dual python_tr hongyux  R    2:30:15      1 node001
123457 disc_dual tensor_jo  user2   R   12:45:22      1 node002
123458 disc_dual pytorch_t  user3  PD    0:00:00      1 (Resources)
123459 disc_a100 train_ml  user4   R    5:12:33      1 node003

The output shows job ID, partition name, job name, user, status (R=running, PD=pending), runtime, number of nodes, and node assignment or reason for pending.
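
Instead of piping through grep, squeue can also filter by user; this is a standard SLURM option, and the username is just whatever account you log in with.

# Show only your own jobs
$ squeue -u $USER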

Advanced SLURM Usage Tips

Here are some additional SLURM commands and features that can help you manage your computational jobs more effectively:

1. Job Dependencies

You can make jobs wait for other jobs to complete:

# Wait for job 123456 to complete before starting
sbatch --dependency=afterok:123456 next_job.sh

# Wait for job 123456 to fail before starting
sbatch --dependency=afternotok:123456 cleanup_job.sh
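
When chaining jobs by hand, it helps to capture the job ID automatically instead of copying it from squeue. A minimal sketch, assuming sbatch --parsable (which prints only the job ID) and a placeholder script name first_job.sh:

# Submit the first job and capture its job ID
JOBID=$(sbatch --parsable first_job.sh)

# Submit the second job so it starts only after the first completes successfully
sbatch --dependency=afterok:$JOBID next_job.sh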

2. Resource Monitoring

Monitor your job's resource usage:

# View detailed job information
sacct -j JobID --format=JobID,JobName,MaxRSS,Elapsed

# Monitor memory usage of running job
sstat --format=AveCPU,AveRSS,AveVMSize --jobs JobID
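
For context, the jobs being monitored above are usually submitted with a batch script. The sketch below is a minimal example only; the partition name comes from the squeue output earlier, and the job name, resource requests, environment name, and train.py are placeholders to adjust for your own work.

#!/bin/bash
#SBATCH --job-name=python_tr      # job name shown by squeue
#SBATCH --partition=disc_dual     # partition (see squeue output above)
#SBATCH --nodes=1                 # number of nodes
#SBATCH --ntasks=1                # number of tasks
#SBATCH --cpus-per-task=4         # CPU cores per task
#SBATCH --mem=16G                 # memory per node
#SBATCH --time=12:00:00           # wall-clock limit (HH:MM:SS)

# Set up the software environment (placeholders; adjust to your setup)
module load Miniconda3
conda activate myproject          # may require initializing conda for batch shells

python train.py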

Conda Setup

OSCER provides Miniconda as a loadable module so you can create separate research environments for your needs; machine learning work in particular often requires specific Python packages. Here is an example of setting up an environment.

# Step 1: Load the Miniconda module
module load Miniconda3

# Step 2: Verify Miniconda Installation
conda --version

# Step 3: Create a New Environment with a Specific Python Version
# Replace X.Y with the desired Python version (e.g., 3.9, 3.10, 3.11)
conda create --name myproject python=X.Y

# Step 4: Activate the New Environment
conda activate myproject

# Step 5: Verify Python Version
python --version

# Optional: Install Specific Packages
conda install numpy pandas matplotlib

# To deactivate the environment when done
conda deactivate

# Useful Additional Commands:
# List all environments
conda env list

# Remove an environment
conda env remove --name myproject
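
To make an environment reproducible, for example to rebuild it on another machine or share it with collaborators, you can export it to a file and recreate it later. These are standard conda commands; environment.yml is just the conventional filename.

# Export the active environment to a file
conda env export > environment.yml

# Recreate the environment from the file
conda env create -f environment.yml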

You can also install Miniconda in a location of your choice.

Visit https://docs.conda.io/en/latest/miniconda.html to download the appropriate installer.
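
As a sketch, a command-line install into a custom directory looks like the following; the installer name and URL are the standard Linux x86_64 ones linked from the page above, and the install prefix is just an example path.

# Download the Linux x86_64 installer
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

# Install non-interactively (-b) into a custom prefix (-p)
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/software/miniconda3

# Initialize conda for your shell
$HOME/software/miniconda3/bin/conda init bash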

CUDA/SeisBench Setup

For CUDA, you can use module load to load any available module, including CUDA.

By default, module load CUDA loads the most recent version. For example, module spider cuda lists the versions installed on the system:

[hongyux@schooner3 ~]$ module spider cuda


--------------------------------------------------------------------------------------------
  CUDA:
--------------------------------------------------------------------------------------------
    Description:
      CUDA (formerly Compute Unified Device Architecture) is a parallel computing platform
      and programming model created by NVIDIA and implemented by the graphics processing
      units (GPUs) that they produce. CUDA gives developers access to the virtual
      instruction set and memory of the parallel computational elements in CUDA GPUs.

     Versions:
        CUDA/5.5.22-GCC-4.8.2
        CUDA/7.5.18-GCC-4.9.3-2.25
        CUDA/7.5.18
        CUDA/8.0.44-GCC-4.9.3-2.25
        CUDA/8.0.44-intel-2016a
        CUDA/8.0.61_375.26-GCC-5.4.0-2.26
        CUDA/9.1.85-GCC-6.4.0-2.28
        CUDA/9.2.88
        CUDA/10.1.105-GCC-8.2.0-2.31.1
        CUDA/10.1.243-GCC-8.3.0
        CUDA/11.0.2-GCC-9.3.0
        CUDA/11.1.1-GCC-10.2.0
        CUDA/11.3.1
        CUDA/11.5.0
        CUDA/11.7.0
        CUDA/11.8.0
        CUDA/12.0.0
        CUDA/12.1.1
        CUDA/12.2.0
        CUDA/12.3.0

In this scenario, if you type module load CUDA, you will get the following:

[hongyux@schooner3 ~]$ module load CUDA
[hongyux@schooner3 ~]$ module list

Currently Loaded Modules:
  1) binutils/2.38   2) M4/1.4.18   3) flex/2.6.4   4) CUDA/12.3.0
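
If you need a specific CUDA version rather than the default, for example to match the PyTorch build you installed, name the version explicitly; the version below is one of those listed by module spider above.

# Load a specific CUDA version instead of the default
module load CUDA/11.8.0

# Confirm what is loaded
module list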

💡 SeisBench is compatible with CUDA, but be very careful to match the CUDA version to the SeisBench (and PyTorch) version you install.

Here is an example of installing SeisBench:

# SeisBench Installation Methods

# 1. Using pip (recommended for most users)
# Install a specific version
pip install seisbench==0.1.0

# Install the latest version
pip install seisbench

# 2. Conda Environment Installation
conda create -n seisbench python=3.9
conda activate seisbench
pip install seisbench

# 3. Additional Dependencies for Full Functionality
pip install torch torchvision torchaudio
pip install numpy pandas matplotlib
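
The plain pip install torch line above installs whichever build PyPI serves by default. If you need a PyTorch build matched to a specific CUDA version (see the compatibility note above), PyTorch publishes per-CUDA wheel indexes; a sketch, assuming CUDA 11.8, looks like this.

# Install a PyTorch build compiled against CUDA 11.8 (adjust cu118 to your CUDA version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118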

Here is an example of showing the installed SeisBench version:

(TL) [hongyux@schooner3 ~]$ pip show seisbench
Name: seisbench
Version: 0.7.0
Summary: The seismological machine learning benchmark collection
Home-page: 
Author: 
Author-email: Jack Woolam <jack.woollam@kit.edu>, Jannes Münchmeyer <munchmej@gfz-potsdam.de>
License: GPLv3
Location: /home/hongyux/.conda/envs/TL/lib/python3.12/site-packages
Requires: bottleneck, h5py, nest-asyncio, numpy, obspy, pandas, scipy, torch, tqdm

SeisBench also maintains a helpful releases page on GitHub; please take advantage of it: https://github.com/seisbench/seisbench/releases

If your code is not running, check the SeisBench and CUDA version compatibility first.
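
A quick sanity check is to ask PyTorch which CUDA build it was compiled against and whether it can see a GPU, and to print the installed SeisBench version; these use standard torch and seisbench attributes, run here from the shell.

# CUDA version PyTorch was built with, and whether a GPU is visible
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"

# Installed SeisBench version
python -c "import seisbench; print(seisbench.__version__)"

# Driver-side view of the GPU and its supported CUDA version (run on a GPU node)
nvidia-smi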